1 Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
2 Howard Hughes Medical Institute, Seattle, WA 98195, USA
3 Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
* Correspondence to Vikram Agarwal (vagar@uw.edu) and Jay Shendure (shendure@uw.edu)
The recorded information governing transcription is incomplete, which makes building an accurate predictive model difficult
Deep learning provides an opportunity to achieve state-of-the-art prediction performance despite incomplete biological information
Therefore, the researchers created Xpresso, a deep learning model designed to predict mRNA abundance from promoter sequence composition and mRNA degradation features
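As a sketch of the kind of input such a model consumes (hypothetical helper and illustrative feature values, not the paper's actual preprocessing code), a promoter sequence can be one-hot encoded and paired with a vector of mRNA half-life proxy features:

```python
import numpy as np

def one_hot_promoter(seq):
    """One-hot encode a DNA sequence as a (length, 4) array over A, C, G, T."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        if base in idx:  # ambiguous bases (e.g. N) stay as all-zero rows
            out[i, idx[base]] = 1.0
    return out

# Illustrative inputs: a short promoter fragment plus half-life proxy
# features (e.g. UTR lengths, ORF length, exon density) with made-up values.
promoter = one_hot_promoter("ACGTNACGT")
half_life_features = np.array([0.3, 1.2, 0.8, 0.5, 0.1])
print(promoter.shape)  # (9, 4)
```

A model like Xpresso processes the one-hot sequence with convolutional layers and concatenates the half-life features downstream before the final regression output.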
| Source: https://www.sumologic.com/wp-content/uploads/performances_vs_data.png | Source: Nauman et al. 2017 |
|---|---|
| ![]() | ![]() |
The building block of a neural network is called a perceptron
A neural network (NN) is a series of multilayered perceptrons that seeks to predict an output. A deep NN has many more layers than a regular NN and can learn more effectively from larger amounts of data
For biologists, most deep learning is implemented in the form of deep convolutional neural networks
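A minimal sketch of these building blocks (toy weights, nothing trained): a single perceptron is a weighted sum passed through a nonlinearity, and a 1D convolution slides such a unit along a sequence, which is why convolutional networks suit DNA input:

```python
import numpy as np

def perceptron(x, w, b):
    """One perceptron: weighted sum of inputs plus bias, then ReLU."""
    return np.maximum(0.0, x @ w + b)

def conv1d(x, kernel):
    """Naive 1D convolution: slide a single filter along the sequence."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.array([1.0, -2.0, 3.0])
w = np.array([0.5, 0.5, 0.5])
out = perceptron(x, w, b=0.0)  # ReLU(0.5*1 - 0.5*2 + 0.5*3) = 1.0

signal = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
pair_detector = conv1d(signal, np.array([1.0, 1.0]))  # peaks where 1s are adjacent
```

Stacking many such filters, with pooling and dense layers on top, gives the deep convolutional architecture used by models like Xpresso.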






```python
# Figure 1
get_fig_from_page(34, left=True, full_w=True, top=1075, height=2150, shrink=1)
# Figure 2
get_fig_from_page(36, left=True, full_w=True, top=925, height=2900, shrink=1)
# Figure 3A - F
get_fig_from_page(38, left=True, full_w=True, top=900, height=2600, shrink=1)
# Figure 3G - H
get_fig_from_page(38, left=True, full_w=True, top=3500, height=1000, shrink=1)
# Figure 4
get_fig_from_page(41, left=True, full_w=True, top=800, height=2150, shrink=1)
# Figure 5
get_fig_from_page(43, left=True, full_w=True, top=800, height=1850, shrink=1)
# Figure 6
get_fig_from_page(44, left=True, full_w=True, top=900, height=2650, shrink=1)
```
Xpresso can explain up to 59% of the variation in human gene expression and 71% in mouse gene expression from promoter sequence and mRNA half-life proxy features alone
Xpresso outperforms other transcriptional models even when those models incorporate histone ChIP-seq, DNase-seq, TF ChIP-seq, or MPRA data
Although cell-type-specific models achieved a higher r², the cell-type-agnostic models performed comparably to the cell-type-specific models
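The r² values reported above measure the fraction of variance in measured expression that the model's predictions explain. A minimal sketch of one common definition of that metric (the coefficient of determination, computed here on toy values rather than the paper's data):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy log-expression values, for illustration only
measured = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])
score = r_squared(measured, predicted)
print(round(score, 3))  # 0.98
```

An r² of 0.59 would mean the predictions account for 59% of the observed variance, matching the headline human result.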
The regions most relevant to the network's expression predictions reside near the transcription start site
CpG islands within the proximal promoter are the most influential motif found for expression prediction
What do you think could be done to improve Xpresso?
What additional analysis do you wish they had performed to validate Xpresso?
Do you agree that Xpresso is accurately predicting real-world transcription efficiencies, or is it just looking for CpG motifs near the transcription start site to inform its decisions?